DPO cleanup #1126

Merged: 23 commits merged into main from dpo-cleanup on Jan 23, 2024

Conversation

winglian (Collaborator)

Description

This PR cleans up some hardcoding, improves the integration with trl's DPOTrainer, and adds support for DPO prompt_strategies.

Resolved review thread on src/axolotl/utils/data.py (outdated)
@plaguss (Contributor) left a comment:

Awesome PR! I left a comment in case you find it useful. Also, maybe it could be tackled in a different PR, but the preprocess command could also be updated to allow checking RL datasets:

-    _ = load_datasets(cfg=parsed_cfg, cli_args=parsed_cli_args)
+    if parsed_cfg.rl:
+        _ = load_rl_datasets(cfg=parsed_cfg, cli_args=parsed_cli_args)
+    else:
+        _ = load_datasets(cfg=parsed_cfg, cli_args=parsed_cli_args)
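
For context, a minimal sketch of how the suggested branch could be wired into a preprocess entry point; the wrapper function name and the import path are assumptions for illustration only, not the actual axolotl code:

# Hedged sketch only; the import location and run_preprocess wrapper are assumed.
from axolotl.cli import load_datasets, load_rl_datasets  # assumed import path

def run_preprocess(parsed_cfg, parsed_cli_args):
    # Branch on the rl flag from the parsed config, as in the suggested diff above.
    if parsed_cfg.rl:
        return load_rl_datasets(cfg=parsed_cfg, cli_args=parsed_cli_args)
    return load_datasets(cfg=parsed_cfg, cli_args=parsed_cli_args)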

Resolved review thread on src/axolotl/utils/data.py (outdated)
@winglian force-pushed the dpo-cleanup branch 2 times, most recently from d5f97c3 to c0a1553 on January 23, 2024 at 02:21

def load(strategy, cfg):
    try:
        load_fn = strategy.split(".")[-1]

A contributor commented:

This is most likely not correct. The strategy name uses underscores rather than a ".", for example intel_apply_chatml.

@filippo82 (Contributor) commented on Jan 23, 2024:

def load(strategy, cfg):
    try:
        load_fn = strategy.split("_")[-1]
        #strategy = ".".join(strategy.split("_")[:-1])
        LOG.info(load_fn)
        LOG.info(strategy)
        mod = importlib.import_module(f".{load_fn}", "axolotl.prompt_strategies.dpo")
        func = getattr(mod, strategy)
        load_kwargs = {}
        return func(cfg, **load_kwargs)
    except Exception as e:  # pylint: disable=broad-exception-caught
        LOG.warning(e)
        return None

A contributor replied:

This works for me.

@winglian (Collaborator, Author) replied:

The intention is that the setting is something like

type: chatml.argilla

in which case it will load the argilla function from the axolotl.prompt_strategies.dpo.chatml module.
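
A minimal sketch of that resolution, assuming the dotted type: chatml.argilla form described above; this is an illustration consistent with the comment, not necessarily the exact merged code, and the warning message wording is an assumption:

import importlib
import logging

LOG = logging.getLogger(__name__)

def load(strategy, cfg):
    # e.g. "chatml.argilla" -> function "argilla" in axolotl.prompt_strategies.dpo.chatml
    try:
        load_fn = strategy.split(".")[-1]               # "argilla"
        submodule = ".".join(strategy.split(".")[:-1])  # "chatml"
        mod = importlib.import_module(f".{submodule}", "axolotl.prompt_strategies.dpo")
        func = getattr(mod, load_fn)
        return func(cfg)
    except Exception as exc:  # pylint: disable=broad-exception-caught
        LOG.warning("unable to load dpo prompt strategy %s: %s", strategy, exc)
        return None

With this sketch, load("chatml.argilla", cfg) would import axolotl.prompt_strategies.dpo.chatml and call its argilla(cfg) function.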

A contributor replied:

Hi @winglian 👋🏻 thanks. That makes sense. I will test it later today 👍🏻

@winglian merged commit 7523d1f into main on Jan 23, 2024
1 of 6 checks passed
@winglian deleted the dpo-cleanup branch on January 23, 2024 at 05:40
djsaunde pushed a commit that referenced this pull request Dec 17, 2024
* cleanup dpo to be a little more extensible, add zephyr/nectar strategy

* fix eos slash

* support for eval split

* fix kwargs

* handle empty evals

* don't load peft model for dpo

* ensure dpo training args gets bf16 for peft if applicable

* fix duplicate kwargs for bf16

* make sure to respect the configured lr scheduler

* support trainer callback to push config to wandb

* set dataloader preload args

* ensure that we are loading the lora when merging

* Update src/axolotl/utils/data.py

Co-authored-by: Agus <[email protected]>

* support local datasets for dpo

Co-authored-by: Agus <[email protected]>

* chore: lint

* dpo/kto/ipo smoke tests w lora, simplify dpo dataset type names

* add split to dpo tests

* fix rebase/merging error

* handle edge case w logging

* use accelerator for dpo datasets so it doesn't break the logger

* missing args

* validate checkpoint is an adapter for now

* log warning when dataset strategy is not loadable

---------

Co-authored-by: Agus <[email protected]>